fix(llm): improve graph JSON parsing robustness for LLM outputs by linmengmeng-1314 · Pull Request #332 · apache/hugegraph-ai

linmengmeng-1314 · 2026-05-18T11:57:36Z

Summary

- Improve `_extract_and_filter_label` to handle varying LLM output formats
- Strip markdown code blocks before JSON extraction
- Support both `{"vertices":[...], "edges":[...]}` (object) and flat array formats
- Auto-convert flat arrays to the expected object structure

Problem

When using reasoning models (e.g., DeepSeek V4) for graph extraction, the LLM may return:

- JSON wrapped in markdown code blocks (```json ... ```), which breaks the greedy regex `({.*})`
- A flat array `[vertex, edge, ...]` instead of the expected object `{"vertices": [...], "edges": [...]}`

Both cases cause json.JSONDecodeError and result in empty extraction output even though the LLM correctly identified entities and relationships.
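The array failure can be reproduced in isolation. The sketch below is illustrative only: the reply string, the `greedy_object_parse` name, and the use of `re.DOTALL` are assumptions, not code taken from this PR. It shows that a first-`{`-to-last-`}` capture over a flat array yields two comma-separated objects, which is not a single JSON document.

```python
import json
import re

# Hypothetical LLM reply: a flat array wrapped in a markdown code fence.
reply = (
    "```json\n"
    '[{"type": "vertex", "label": "person"}, {"type": "edge", "label": "knows"}]\n'
    "```"
)


def greedy_object_parse(text: str):
    """Mimic a greedy object-only capture (assumed shape of the old logic)."""
    # Greedy match from the first "{" to the last "}" in the reply.
    match = re.search(r"(\{.*\})", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # The capture is '{...}, {...}': extra data after the first object.
        return None


# The entities were present in the reply, yet nothing is extracted.
result = greedy_object_parse(reply)
```

Here `result` is `None` even though the reply contains one vertex and one edge, which matches the empty-output symptom described above.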

Solution

- Strip markdown code fences (```json / ```) before regex matching
- Update regex to match both objects (`{...}`) and arrays (`[...]`)
- When a flat array is detected, partition items by `type` field into vertices and edges
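The three steps can be sketched roughly as follows. This is a minimal standalone illustration, not the code merged in this PR: the function name `parse_graph_json`, the exact regular expressions, and the `"vertex"` / `"edge"` values of the `type` field are all assumptions made for the example.

```python
import json
import re


def parse_graph_json(text: str) -> dict:
    """Hypothetical helper illustrating the three steps; not the PR's code."""
    empty = {"vertices": [], "edges": []}

    # Step 1: drop markdown fence markers such as ```json and ```.
    cleaned = re.sub(r"```(?:json)?", "", text)

    # Step 2: capture either an object {...} or an array [...].
    match = re.search(r"(\{.*\}|\[.*\])", cleaned, re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        # Malformed JSON yields no graph items instead of raising.
        return empty

    # Step 3: a flat array is partitioned by each item's "type" field
    # (the "vertex"/"edge" values are assumed for this sketch).
    if isinstance(data, list):
        items = [item for item in data if isinstance(item, dict)]
        return {
            "vertices": [i for i in items if i.get("type") == "vertex"],
            "edges": [i for i in items if i.get("type") == "edge"],
        }
    return {"vertices": data.get("vertices", []), "edges": data.get("edges", [])}
```

In this sketch, items that are not dictionaries or that carry an unrecognized `type` are silently dropped, and malformed input returns an empty structure. Whether the real parser should log or surface such cases is a separate design choice.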

Test plan

- Test with OpenAI models (existing behavior should be preserved)
- Test with DeepSeek models (markdown-wrapped array format)
- Test with Ollama models
- Verify both object and array formats are handled correctly

🤖 Generated with Claude Code

…tputs

Different LLMs return graph extraction results in varying formats:
- Some wrap JSON in markdown code blocks (```json ... ```)
- Some return a flat array of vertices/edges instead of a structured object

This causes json.JSONDecodeError when the greedy regex ({.*}) captures invalid content from markdown-wrapped or array-formatted responses.

Changes:
- Strip markdown code blocks before JSON extraction
- Support both object ({...}) and array ([...]) JSON formats
- Auto-convert flat arrays to {"vertices": [...], "edges": [...]} format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

dosubot Bot added the size:S (This PR changes 10-29 lines, ignoring generated files) and bug (Something isn't working) labels May 18, 2026

linmengmeng-1314 mentioned this pull request May 18, 2026

LLM output format compatibility issues with reasoning models (DeepSeek, etc.) #333

Closed

github-actions Bot added the llm label May 18, 2026

imbajin requested a review from Copilot May 18, 2026 13:31

Copilot started reviewing on behalf of imbajin May 18, 2026 13:31

Copilot AI reviewed May 18, 2026

huaun-develop and others added 3 commits May 19, 2026 09:26

style: fix ruff format for property_graph_extract.py

f6f2825

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(graph): add property graph parser regressions

a1e51ae

- cover markdown fenced property graph JSON output
- cover flat array vertex and edge parsing
- keep tests scoped to _extract_and_filter_label behavior

fix(graph): cover graph parser edge cases

9ef1e2d

- add coverage for fenced JSON with prose around it
- verify flat arrays drop invalid graph items
- cover malformed fenced JSON returning no graph items

imbajin requested a review from Copilot May 19, 2026 04:53

Copilot started reviewing on behalf of imbajin May 19, 2026 04:53

Copilot AI reviewed May 19, 2026

imbajin requested a review from Copilot May 19, 2026 07:42

Copilot started reviewing on behalf of imbajin May 19, 2026 07:42

imbajin approved these changes May 19, 2026

dosubot Bot added the lgtm (This PR has been approved by a maintainer) label May 19, 2026

imbajin changed the title ~~fix(graph): improve property graph JSON parsing robustness for LLM outputs~~ fix(llm): improve graph JSON parsing robustness for LLM outputs May 19, 2026

imbajin merged commit 016158f into apache:main May 19, 2026
13 checks passed

linmengmeng-1314 review requested due to automatic review settings May 19, 2026 08:06

fix(llm): improve graph JSON parsing robustness for LLM outputs #332
imbajin merged 4 commits into apache:main from linmengmeng-1314:fix/graph-extract-json-parsing

4 participants